Training Algorithms for Linear Text Classifiers

Authors

  • David D. Lewis
  • Robert E. Schapire
  • James P. Callan
  • Ron Papka
Abstract

Systems for text retrieval, routing, categorization and other IR tasks rely heavily on linear classifiers. We propose that two machine learning algorithms, the Widrow-Hoff and EG algorithms, be used in training linear text classifiers. In contrast to most IR methods, theoretical analysis provides performance guarantees and guidance on parameter settings for these algorithms. Experimental data is presented showing Widrow-Hoff and EG to be more effective than the widely used Rocchio algorithm on several categorization and routing tasks.
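
As an illustration of the two update rules named in the abstract, the following is a minimal Python sketch of their commonly stated squared-loss forms. The abstract itself does not spell out the updates, so the function names, the learning rate `eta`, and the toy inputs are assumptions for illustration, not the authors' experimental setup.

```python
import math


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wj * xj for wj, xj in zip(w, x))


def widrow_hoff_step(w, x, y, eta):
    """One Widrow-Hoff (LMS) update: w <- w - 2*eta*(w.x - y)*x."""
    err = dot(w, x) - y
    return [wj - 2.0 * eta * err * xj for wj, xj in zip(w, x)]


def eg_step(w, x, y, eta):
    """One exponentiated-gradient update.

    Each weight is scaled multiplicatively, then the vector is
    renormalized so the weights stay positive and sum to 1.
    """
    err = dot(w, x) - y
    unnorm = [wj * math.exp(-2.0 * eta * err * xj) for wj, xj in zip(w, x)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def train(step, w0, examples, eta, epochs=1):
    """Apply an online update rule over (x, y) pairs for some epochs."""
    w = list(w0)
    for _ in range(epochs):
        for x, y in examples:
            w = step(w, x, y, eta)
    return w


# Hypothetical toy data: two term features, label 1 = relevant.
examples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
w_wh = train(widrow_hoff_step, [0.0, 0.0], examples, eta=0.25)
w_eg = train(eg_step, [0.5, 0.5], examples, eta=0.5)
```

Widrow-Hoff moves the weights additively and typically starts from the zero vector, while EG moves them multiplicatively and starts from a uniform vector on the simplex, which is why the two calls above use different initial weights.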

Similar resources

Bayesian Classifiers Are Large Margin Hyperplanes in a Hilbert Space

Bayesian algorithms for Neural Networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian Learning Algorithms is that they don't simply output one hypothesis, but rather an entire distribution of probability over an hypothesis set: the Bayes posterior. An alternative perspective is that they output a...

Text segmentation in mixed-mode images using classification trees and transform tree-structured vector quantization

Multimedia applications such as educational videos and color facsimile contain images that are rich in both textual and continuous tone data. Because these two types of data have different properties, segmentation of the images into text and continuous tone data can improve compression by allowing different compression parameters or even algorithms to be employed on the different types. In this p...

Bayesian Classifiers Are Large Margin Hyperplanes in a Hilbert Space Produced as Part of the Esprit Working Group in Neural and Computational Learning II, NeuroCOLT2 27150

Bayesian algorithms for Neural Networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian Learning Algorithms is that they don't simply output one hypothesis, but rather an entire distribution of probability over an hypothesis set: the Bayes posterior. An alternative perspective is that they output ...

Modify the linear search formula in the BFGS method to achieve global convergence.

A Sequential Algorithm for Training

The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertaint...

Publication date: 1996